Skip to contents

Intro

This vignette shows how to run a Kernel Density Estimate (KDE) and Hotspot analysis using the data retrieved using replicaToolkitR package. The goal of this document is not to present a complete analysis but to showcase how easily it is to implement such analyses using the data products created by replicaToolkitR from Replica.

NOTE: This vignette uses data that comes from replicaToolkitR, the user should have a precursory understanding of how to acquire this data themselves using replicaToolkitR::query_network_trip_using_bbox().

Set-Up

Here are the packages you’ll need for this vignette.

Data Overview

The data used in this analysis was acquired using the study area bounding box seen below.

Note: Any trip travelling though this zone is queried via replicaToolkitR. This is done to account for all vehicles traveling on study area network links. The full list of trips queried all listed below:

  • internal-to-internal: study area to study area trips
  • internal-to-external: study area to external trips
  • external-to-internal: external to study area trips
  • external-to-external: external to external trips (through trips)

replicaToolkitR, however; filters for trips that only originate within the study area polygon when displaying or aggregating anything regarding trip origins.

The below map details trip origins as counts aggregated by each trips first network link. Links in this case are displayed as points determined by starting coordinates. Normally, this data would be loaded from the replica_trip_origin_links.gpkg file the user creates when querying Google but in this instance it is loaded from the package.

data("replica_trip_origin_links_gpkg")

replica_trip_origin_links_fltrd = replica_trip_origin_links_gpkg %>%  
  st_filter(study_area_network) %>%  
  st_transform(crs = 32610)

KDE and Hotspot

Data Prep

The SpatialKDE and sfhotspot packages are used to perform the KDE analysis. The latter is built upon the former and provides some convenience wrappers if you don’t want to get into get into the nitty-gritty of some of the function options. Both are used in this vignette.

First, create a grid that covers the spread of my data. In this case, a cell size of 200 meters was chosen as it is roughly the size of a city blocks in Seattle and a good unit for this analysis.

grid_sm = create_grid_hexagonal(
  replica_trip_origin_links_fltrd
  ,cell_size = 200)

The water around the city is then removed using the tigris package.

This is an important step, as it make the distinction between a location that does not generate any trips and one incapable of generating trips - i.e. freight trips cannot start in the middle of Lake Washington.

grid_sm_water_removed = tigris::erase_water(grid_sm) %>%
  st_cast("POLYGON")

Both grids are displayed in the below map. The grid with removed water features is seen in the right panel:

As you can see it is not perfect; Green Lake, Union Bay and other water features have not been removed. In addition, this idea can be extended to other locations incapable of generating trips such as parks, universities, airport tarmac, etc.

The R package mapedit provides functions that allow the user to easily draw and create their own polygons can be used to remove non-trip generating similar to the water removal above. This vignette does not cover this.

Analysis Execution

The below code runs the **hotspot_kde()** function for different vehicle types in the data and using the different grids previously created. The **purrr::pmap()** function allows us to iterate over multiple arguments simultaneously resulting in a list object (and saves us a bunch of key strokes!).

hotspot_obeject = list(
  list("MEDIUM_COMMERCIAL", "MEDIUM_COMMERCIAL", "HEAVY_COMMERCIAL", "HEAVY_COMMERCIAL")
  ,list(grid_sm_water_removed, grid_sm, grid_sm_water_removed, grid_sm)
) %>%
  pmap(~{
    temp_hotspot = replica_trip_origin_links_fltrd %>%
      filter(vehicle_type == .x) %>%
      hotspot_kde(
        data = .
        ,grid = .y
        ,weights = count
        ,bandwidth = 800) %>%
      mutate(kde_norm = gauntlet::normalize_min_max(kde)) %>%
      mutate(label = str_glue("KDE (Min-Max Norm): {dgt2(kde_norm)}<br>KDE (raw): {dgt2(kde)}<br>TTL Trips: {sum}")) %>%
      st_transform(crs = 4326)
  })

The resulting 2D kernel density estimates for trip origins of heavy (left panel) and medium (right panel) duty vehicles are displayed below:

The results are pretty interesting - their is a rather dramatic difference between where heavy and medium duty trucks go in Seattle. Heavy duty trucks stick to the ports, where medium duty tricks are more distributed through out Seattle (they’re smaller and can get around easier, makes sense!).

This analysis can be extended by using hotspot_gistar() function, which allows us to easily calculate the gi-star Z-score statistic used in identifying clusters of point locations. The theory behind this statistic will not be covered here but should be reviewed before implementing it.

temp_hotspot = replica_trip_origin_links_fltrd %>%
      filter(vehicle_type == "MEDIUM_COMMERCIAL") %>%
      hotspot_gistar(
        data = .
        ,grid = grid_sm_water_removed
        ,weights = count
        ,bandwidth = 800)

The map below depicts the result of calculating the gi-star Z-score statistic. Statistically significant hot and cold locations are displayed:

temp_hotspot %>%
  filter(pvalue <= .01) %>%
  mutate(flag_gistar = case_when(gistar > 0~"Hot", T~"Cold")) %>%
  mapview(layer.name = "Hotspots", zcol = "flag_gistar")

By just using the default input options, we get some pretty interesting findings. We see rather large cold spots in Green Lake, Union Bay, Discovery Park (west of Magnolia) - this makes sense as the first two locations are bodies of water and the third is a large wooded park where freight trucks don’t go. In comparison to their neighbors, these locations are especially cold regions for trucks and this identified. One could take these findings and remove these areas in subsequent analysis iterations.

Considerations

This was a very simple vignette intended to highlight a specific workflow stemming from data acquired from the replicaToolkitR package.

For a more detailed analysis, one should consider:

  • full set of locations that should be excluded from the analysis
    • this will help prevent identifying spurious hot/cold locations
  • the proper unit of measure for your grid
  • the bandwidth used in the KDE and gistar calculations
  • the kernel type used in the KDE and gistar calculations
    • the default quadratic kernel was used but their are others that can be specified
  • the neighbor distance used in the gistar calculations
    • this impacts whether a cell will statistically varies from its neighbors
    • altering this will impact the size and shape of hot and cold spots ***